As of July 20, 2023, the AI classifier is no longer available due to
its low rate of accuracy. We are working to incorporate feedback
and are currently researching more effective provenance
techniques for text, and have made a commitment to develop and
deploy mechanisms that enable users to understand if audio or
visual content is AI-generated.


We've trained a classifier to distinguish between text written by a
human and text written by AIs from a variety of providers. While it
is impossible to reliably detect all AI-written text, we believe good
classifiers can inform mitigations for false claims that AI-generated
text was written by a human: for example, running automated
misinformation campaigns, using AI tools for academic dishonesty,
and positioning an AI chatbot as a human.


Our classifier is not fully reliable. In our evaluations on a
“challenge set” of English texts, our classifier correctly identifies
26% of AI-written text (true positives) as “likely AI-written,” while
incorrectly labeling human-written text as AI-written 9% of the
time (false positives). Our classifier’s reliability typically improves
as the length of the input text increases. Compared to
our previously released classifier, this new classifier is significantly
more reliable on text from more recent AI systems.
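
To see what these rates imply in practice, the sketch below applies Bayes' rule to the 26% true positive rate and 9% false positive rate reported above. The share of AI-written documents among the inputs (`prevalence`) is not given in this post; the two values used here are purely illustrative assumptions, and the function name is hypothetical.

```python
def flagged_precision(tpr: float, fpr: float, prevalence: float) -> float:
    """Fraction of texts flagged "likely AI-written" that really are AI-written."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    true_flags = tpr * prevalence            # AI-written texts that get flagged
    false_flags = fpr * (1.0 - prevalence)   # human-written texts that get flagged
    total = true_flags + false_flags
    if total == 0.0:
        raise ValueError("no texts would be flagged")
    return true_flags / total


TPR = 0.26  # true positive rate reported on the challenge set
FPR = 0.09  # false positive rate reported on the challenge set

# Assumed prevalences (NOT from the post): 10% and 50% of inputs are AI-written.
for prevalence in (0.10, 0.50):
    precision = flagged_precision(TPR, FPR, prevalence)
    print(f"prevalence={prevalence:.0%}: precision={precision:.1%}")
```

Under these assumptions, roughly 24% of flagged texts would actually be AI-written when 10% of inputs are, and roughly 74% when half of them are. The same classifier can therefore look very different depending on where it is used.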


We're making this classifier publicly available to get feedback on 
whether imperfect tools like this one are useful. Our work on the 
detection of AI-generated text will continue, and we hope to share
improved methods in the future. 


Try our free work-in-progress classifier yourself:


Limitations


Our classifier has a number of important limitations. It should not 
be used as a primary decision-making tool, but instead as a 
complement to other methods of determining the source of a piece 
of text. 


• The classifier is very unreliable on short texts (below 1,000
  characters). Even longer texts are sometimes incorrectly labeled
  by the classifier.


• Sometimes human-written text will be incorrectly but confidently
  labeled as AI-written by our classifier.


• We recommend using the classifier only for English text. It
  performs significantly worse in other languages and it is unreliable
  on code.


• Text that is very predictable cannot be reliably identified. For
  example, it is impossible to predict whether a list of the first 1,000
  prime numbers was written by AI or humans, because the correct
  answer is always the same.


• AI-written text can be edited to evade the classifier. Classifiers like
  ours can be updated and retrained based on successful attacks,
  but it is unclear whether detection has an advantage in the long
  term.


• Classifiers based on neural networks are known to be poorly
  calibrated outside of their training data. For inputs that are very
  different from text in our training set, the classifier is sometimes
  extremely confident in a wrong prediction.
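

The first limitation above suggests a simple guard before any text is sent to a classifier. The sketch below is an illustration only: the function name and return format are hypothetical, and the 1,000-character cutoff is taken from the limitation above rather than from any published API.

```python
from __future__ import annotations

# Cutoff below which the post describes the classifier as "very unreliable".
MIN_CHARS = 1_000


def precheck(text: str, min_chars: int = MIN_CHARS) -> tuple[bool, str]:
    """Return (ok, reason); ok is False when the text is too short to classify."""
    length = len(text.strip())
    if length < min_chars:
        return False, f"text has {length} characters; at least {min_chars} are needed"
    # Passing this check does not make the result trustworthy on its own.
    return True, "length check passed; treat the result as one signal among several"


ok, reason = precheck("A short paragraph.")
print(ok, reason)
```

A check like this addresses only the length limitation; it does nothing for non-English text, code, highly predictable text, or deliberately edited text.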


Training the classifier 


Our classifier is a language model fine-tuned on a dataset of pairs
of human-written text and AI-written text on the same topic. We
collected this dataset from a variety of sources that we believe to
be written by humans, such as the pretraining data and human
demonstrations on prompts submitted to InstructGPT. We divided
each text into a prompt and a response. On these prompts we
generated responses from a variety of different language models
trained by us and other organizations. For our web app, we adjust
the confidence threshold to keep the false positive rate low; in
other words, we only mark text as likely AI-written if the classifier is
very confident.
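

The threshold adjustment described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for the web app: it assumes a held-out set of classifier scores for texts known to be human-written, and the function name, the 10% target, and the toy scores are all hypothetical.

```python
import math


def threshold_for_fpr(human_scores, target_fpr):
    """Smallest threshold t such that at most target_fpr of human texts score above t.

    A text is flagged "likely AI-written" only when its score is strictly above t.
    """
    if not human_scores:
        raise ValueError("need at least one human-written score")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ordered = sorted(human_scores)
    n = len(ordered)
    # Maximum number of human-written texts that may be flagged.
    allowed = math.floor(target_fpr * n)
    # At most `allowed` scores lie strictly above this element (fewer with ties).
    return ordered[n - allowed - 1]


# Toy scores for human-written validation texts (assumed data, not from the post).
human_scores = [0.02, 0.05, 0.10, 0.12, 0.20, 0.31, 0.40, 0.55, 0.70, 0.93]
t = threshold_for_fpr(human_scores, target_fpr=0.10)
flagged = sum(score > t for score in human_scores)
print(f"threshold={t}, observed false positive rate={flagged / len(human_scores):.0%}")
```

Lowering the target raises the threshold, so fewer human-written texts are flagged, but fewer AI-written texts are caught as well. That is the trade-off behind marking text only when the classifier is very confident.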


Impact on educators and call for input 


We recognize that identifying AI-written text has been an important
point of discussion among educators, and equally important is
recognizing the limits and impacts of AI-generated text classifiers
in the classroom. We have developed a preliminary resource on
the use of ChatGPT for educators, which outlines some of the
uses and associated limitations and considerations. While this
resource is focused on educators, we expect our classifier and
associated classifier tools to have an impact on journalists,
mis/disinformation researchers, and other groups.


We are engaging with educators in the United States to learn what 
they are seeing in their classrooms and to discuss ChatGPT’s 
capabilities and limitations, and we will continue to broaden our 
outreach as we learn. These are important conversations to have,
as part of our mission is to deploy large language models safely, in
direct contact with affected communities. 


If you’re directly impacted by these issues (including but not limited 
to teachers, administrators, parents, students, and education 
service providers), please provide us with feedback using this 
form. Direct feedback on the preliminary resource is helpful, and 
we also welcome any resources that educators are developing or 
have found helpful (e.g., course guidelines, honor code and policy 
updates, interactive tools, AI literacy programs).